Database-Driven Mathematical Character Recognition
Abstract
We present an approach for recognising mathematical texts using an extensive LaTeX symbol database and a novel recognition algorithm. The process consists essentially of three steps: recognising the individual characters in a mathematical text by relating them to glyphs in the database of symbols; analysing the recognised glyphs to determine the closest corresponding LaTeX symbol; and reassembling the text by placing the appropriate LaTeX commands at their corresponding positions of the original text inside a LaTeX picture environment. The recogniser itself is based on a novel variation on the application of geometric moment invariants. The working system is implemented in Java.
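The abstract does not describe the authors' variation on geometric moment invariants. For orientation only, the Java sketch below shows the standard technique that variation starts from:

* It computes Hu's first two moment invariants of a binary glyph. These are unchanged under translation and rotation, and approximately unchanged under scaling.
* It labels the glyph with the nearest entry of a symbol database, using Euclidean distance.

The class name `MomentSketch`, the nearest-neighbour matcher and the two-entry database are hypothetical illustrations. They are not the paper's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of moment-invariant glyph matching; not the
// authors' "novel variation", whose details the abstract does not give.
public class MomentSketch {

    // Returns Hu's first two invariants {phi1, phi2} of a binary glyph,
    // where img[y][x] != 0 marks an ink pixel.
    static double[] invariants(int[][] img) {
        // Raw moments m00, m10, m01 give the ink area and the centroid.
        double m00 = 0, m10 = 0, m01 = 0;
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                if (img[y][x] != 0) {
                    m00 += 1;
                    m10 += x;
                    m01 += y;
                }
            }
        }
        if (m00 == 0) {
            throw new IllegalArgumentException("glyph has no ink pixels");
        }
        double xBar = m10 / m00;
        double yBar = m01 / m00;

        // Second-order central moments are invariant under translation.
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                if (img[y][x] != 0) {
                    double dx = x - xBar;
                    double dy = y - yBar;
                    mu20 += dx * dx;
                    mu02 += dy * dy;
                    mu11 += dx * dy;
                }
            }
        }

        // Scale normalisation eta_pq = mu_pq / m00^(1 + (p + q) / 2); p + q = 2 here.
        double norm = m00 * m00;
        double eta20 = mu20 / norm;
        double eta02 = mu02 / norm;
        double eta11 = mu11 / norm;

        // These two combinations are additionally invariant under rotation.
        double phi1 = eta20 + eta02;
        double phi2 = (eta20 - eta02) * (eta20 - eta02) + 4 * eta11 * eta11;
        return new double[] {phi1, phi2};
    }

    // Returns the database key whose stored invariants are closest
    // (Euclidean distance) to those of the query glyph.
    static String classify(int[][] glyph, Map<String, double[]> database) {
        double[] f = invariants(glyph);
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : database.entrySet()) {
            double d0 = f[0] - entry.getValue()[0];
            double d1 = f[1] - entry.getValue()[1];
            double dist = Math.sqrt(d0 * d0 + d1 * d1);
            if (dist < bestDist) {
                bestDist = dist;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical two-entry database keyed by LaTeX input.
        Map<String, double[]> db = new LinkedHashMap<>();
        db.put("-", invariants(new int[][] {{1, 1, 1, 1, 1}}));
        db.put("\\blacksquare", invariants(new int[][] {{1, 1}, {1, 1}}));

        int[][] shiftedBar = {
            {0, 0, 0, 0, 0, 0, 0},
            {0, 1, 1, 1, 1, 1, 0},
            {0, 0, 0, 0, 0, 0, 0}
        };
        int[][] biggerBlock = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}};
        System.out.println(classify(shiftedBar, db));   // prints -
        System.out.println(classify(biggerBlock, db));  // prints \blacksquare
    }
}
```

Because these invariants deliberately ignore rotation, a vertical stroke receives exactly the same features as a horizontal one. The sketch would therefore label `|` as `-`. A practical recogniser for mathematical symbols needs additional information to separate pairs like this.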